
Conditional heavy-tail behavior with applications to precipitation and river flow extremes

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

This article deals with the right-tail behavior of a response distribution \(F_Y\) conditional on a regressor vector \({\mathbf {X}}={\mathbf {x}}\), restricted to the heavy-tailed case of Pareto-type conditional distributions \(F_Y(y|\ {\mathbf {x}})=P(Y\le y|\ {\mathbf {X}}={\mathbf {x}})\), with heaviness of the right tail characterized by the conditional extreme value index \(\gamma ({\mathbf {x}})>0\). We particularly focus on testing the hypothesis \({\mathscr {H}}_{0,tail}:\ \gamma ({\mathbf {x}})=\gamma _0\) of constant tail behavior for some \(\gamma _0>0\) and all possible \({\mathbf {x}}\). When considering \({\mathbf {x}}\) as a time index, the term trend analysis is commonly used. In recent years, several such trend analyses of extreme value data have been published, mostly focusing on time-varying modeling of location or scale parameters of the response distribution. In many such environmental studies a simple test against trend based on Kendall’s tau statistic is applied. This test is powerful when the center of the conditional distribution \(F_Y(y|{\mathbf {x}})\) changes monotonically in \({\mathbf {x}}\), for instance, in a simple location model \(\mu ({\mathbf {x}})=\mu _0+x\cdot \mu _1\), \({\mathbf {x}}=(1,x)'\), but the test is rather insensitive to monotonic changes in tail behavior, say, \(\gamma ({\mathbf {x}})=\eta _0+x\cdot \eta _1\). This has to be considered, since for many environmental applications the main interest is in the tail rather than the center of a distribution. Our work is motivated by this problem, and our goal is to demonstrate the opportunities and the limits of detecting and estimating non-constant conditional heavy-tail behavior with regard to applications from hydrology. We present and compare four different procedures by simulations and illustrate our findings on real data from hydrology: weekly maxima of hourly precipitation from France and monthly maximal river flows from Germany.



Acknowledgements

We would like to thank Professor Andreas Schumann from the Department of Civil Engineering, Ruhr-University Bochum, Germany, for providing us with hydrological data and for helpful discussions. We are also grateful to two anonymous referees and an Associate Editor for their constructive comments on an earlier version of our work. The financial support of the Deutsche Forschungsgemeinschaft (SFB 823, “Statistical modelling of nonlinear dynamic processes”) is gratefully acknowledged.

Author information

Corresponding author

Correspondence to Paul Kinsvater.

Appendices

Appendix: Quantile regression process

Let Y denote a random variable called response and \({\mathbf {X}}=(1,X_1,\ldots ,X_d)'\) a random vector called regressor with support covered by a compact set \({\mathscr {X}}\subset {\mathbb {R}}^{d+1}\). Throughout this section we suppose that the conditional distribution \(F(y|{\mathbf {x}})=P(Y\le y|\,{\mathbf {X}}={\mathbf {x}})\) of Y given \({\mathbf {X}}={\mathbf {x}}\) satisfies

$$F^{-1}(p|\,{\mathbf {x}})=\inf \{y:\,F(y|{\mathbf {x}})\ge p\}={\mathbf {x}}'{\varvec{\beta }}_p$$
(13)

for all \({\mathbf {x}}\in {\mathscr {X}}\), probabilities \(p\in I\subset [\varepsilon ,1-\varepsilon ]\) and an unknown vector-valued function \(p\mapsto {\varvec{\beta }}_p\), \(p\in I\), with \({\varvec{\beta }}_p\in {\mathbb {R}}^{d+1}\) called the p-th regression quantile (Koenker and Bassett 1978). The left-hand side of (13) is called the generalized inverse or quantile of \(F(\cdot |{\mathbf {x}})\) at \(p\in I\). It coincides with the usual inverse of a function, provided the inverse exists. Theoretical aspects and many applications of linear quantile regression are presented in Koenker (2005).

Let \((Y_i,{\mathbf {X}}_i)\), \(i=1,\ldots ,n\), denote independent copies of \((Y,{\mathbf {X}})\). The estimator

$${\hat{\varvec{\beta}}}_p=\underset{{\mathbf {b}}\in {\mathbb {R}}^{d+1}}{{{\text{arg}}}\,{{\text {min}}}}\;\sum _{i=1}^n\rho _p\left( Y_i-{\mathbf {X}}_i'\cdot {\mathbf {b}}\right)$$

with \(\rho _p(y)=y\cdot (p-{\mathbf {1}}_{\{y\le 0\}})\) is called the empirical regression quantile. The following result establishes asymptotic normality of \(\sqrt{n}\left( \hat{{\varvec{\beta }}}_p-{\varvec{\beta }}_p\right)\) uniformly in \(p\in I\), i.e., in the function space \(\left( \ell ^\infty (I)\right) ^{d+1}\) (van der Vaart and Wellner 1996).
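Before stating that result, the definition of the empirical regression quantile can be made concrete with a minimal sketch for the case \({\mathbf {X}}=(1,x)'\). It is not the algorithm used in the paper: it relies on the standard fact that the convex, piecewise linear objective has a minimizer among the lines interpolating two data points, and it simply tries all such lines. The function names and the simulated toy data are assumptions made for illustration only.

```python
import itertools
import random


def check_loss(u, p):
    """Check function rho_p(u) = u * (p - 1{u <= 0})."""
    return u * (p - (1.0 if u <= 0 else 0.0))


def objective(b, xs, ys, p):
    """Sum of check losses for the line b[0] + b[1] * x."""
    return sum(check_loss(y - (b[0] + b[1] * x), p) for x, y in zip(xs, ys))


def regression_quantile(xs, ys, p):
    """Empirical regression quantile for X = (1, x)' by exhaustive search.

    Every pair of data points with distinct x defines a candidate line;
    the candidate with the smallest objective value is returned.
    """
    best_b, best_val = None, float("inf")
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # no unique line through two points with equal x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b = (ys[i] - slope * xs[i], slope)
        val = objective(b, xs, ys, p)
        if val < best_val:
            best_b, best_val = b, val
    return best_b


# Toy data (an assumption for illustration): Y = 1 + 2x + (1 + x) * noise,
# so every conditional quantile of Y is linear in x, as required by (13).
random.seed(1)
n = 60
xs = [random.uniform(0.0, 1.0) for _ in range(n)]
ys = [1.0 + 2.0 * x + (1.0 + x) * random.gauss(0.0, 1.0) for x in xs]

b_med = regression_quantile(xs, ys, 0.5)
b_hi = regression_quantile(xs, ys, 0.9)
print("p = 0.5 fit (intercept, slope):", b_med)
print("p = 0.9 fit (intercept, slope):", b_hi)
```

The exhaustive search costs on the order of \(n^3\) operations and is only meant to mirror the definition; in practice the same minimization is usually solved as a linear program.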

Theorem 1

Suppose that, uniformly in \({{\mathbf {x}}}\in {\mathscr {X}}\), the conditional density \(f(y|{\mathbf {x}})\) exists, is bounded and uniformly continuous in y. Suppose further that \({\mathbb {E}}\Vert {\mathbf {X}}\Vert ^{2+\delta }<\infty\) for some \(\delta >0\) and that

$$J={\mathbb {E}}\left[ {\mathbf {X}}{\mathbf {X}}'\right] \,{\text { and }}\,H_p={\mathbb {E}}\left[ {\mathbf {X}}{\mathbf {X}}'\cdot f(F^{-1}(p|{\mathbf {X}})|{\mathbf {X}})\right]$$
(14)

exist with \(H_p\) positive definite for all \(p\in I\). Then, for \(n\rightarrow \infty\), we have that

$$\left( H_p\sqrt{n}\left( {\hat{{\varvec{\beta }}}}_p-{\varvec{\beta }}_p\right) \right) _{p\in I}\xrightarrow{D}{\mathbb {Z}}$$
(15)

in \(\left( \ell ^\infty (I)\right) ^{d+1}\), where \({\mathbb {Z}}\) is a centered Gaussian process with \({\mathbb {E}}[{\mathbb {Z}}(p){\mathbb {Z}}(q)']=(p\wedge q-p\cdot q)\cdot J\).

The previous result allows us to estimate the joint distribution of several empirical regression quantiles. Let \({\mathbf {p}}=\{p_1,\ldots ,p_\ell \}\subset I\) denote a set of probabilities. Then, for \(n\rightarrow \infty\), we immediately obtain that

$$\sqrt{n}\left( {\hat{\varvec{\beta}}}_{p_1}-\varvec{\beta }_{p_1},\ldots ,{\hat{\varvec{\beta}}}_{p_\ell }-{\varvec{\beta }}_{p_\ell }\right) '{\mathop {\longrightarrow }\limits^{D}} {\mathscr {N}}\left( 0,\Sigma _{{\mathbf {p}}}\right) ,$$

where \(\Sigma _{{\mathbf {p}}}\) is defined blockwise through

$$\lim _{n\rightarrow \infty }{{{\mathrm{Cov}}}}\left[ \sqrt{n}\left( \hat{{\varvec{\beta }}}_{p_i}-{\varvec{\beta }}_{p_i}\right) ,\,\sqrt{n}\left( \hat{{\varvec{\beta }}}_{p_j}-{\varvec{\beta }}_{p_j}\right) \right] =(p_i\wedge p_j-p_i\cdot p_j)\cdot H_{p_i}^{-1}JH_{p_j}^{-1}.$$

This result is used to prove Proposition 1.
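To see what the covariance formula above says in the simplest setting, consider the intercept-only case \(d=0\) with \({\mathbf {X}}=1\), so that \(J=1\) and \(H_p=f(F^{-1}(p))\). Each block then reduces to the classical asymptotic covariance \((p_i\wedge p_j-p_i\cdot p_j)/\{f(F^{-1}(p_i))\,f(F^{-1}(p_j))\}\) of sample quantiles. The sketch below evaluates this special case for a standard normal response; the function name and the choice of distribution are assumptions for illustration.

```python
import math
from statistics import NormalDist


def quantile_cov(ps, densities):
    """Asymptotic covariance matrix of sqrt(n) * (sample quantile - quantile)
    in the intercept-only case, where J = 1 and H_p = f(F^{-1}(p)).

    ps        : probabilities p_1, ..., p_l
    densities : values f(F^{-1}(p_i)), i.e. the scalars H_{p_i}
    """
    size = len(ps)
    return [
        [
            (min(ps[i], ps[j]) - ps[i] * ps[j]) / (densities[i] * densities[j])
            for j in range(size)
        ]
        for i in range(size)
    ]


std_normal = NormalDist()                       # illustrative choice of F
ps = [0.25, 0.5, 0.75]
dens = [std_normal.pdf(std_normal.inv_cdf(p)) for p in ps]
sigma = quantile_cov(ps, dens)

for row in sigma:
    print([round(v, 4) for v in row])
```

The middle diagonal entry is the familiar asymptotic variance \(\pi /2\) of the sample median under a standard normal distribution, which serves as a sanity check of the formula.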

Conditional heavy-tail behavior: competing methods

1.1 Tail index regression (TIR) by Wang and Tsai (2009)

Wang and Tsai (2009) study model (3) with \(\alpha ({\mathbf {x}})=1/\gamma ({\mathbf {x}})=\exp ({\mathbf {x}}'\theta )\) for some unknown parameter vector \(\theta \in {\mathbb {R}}^{d+1}\). They propose the estimator

$$\hat{\theta }_{u_n}=\underset{\theta \in {\mathbb {R}}^{d+1}}{{{\text {arg}}}\,{{\text {min}}}}\;\sum _{i=1}^n\left[ \exp ({\mathbf {X}}_i'\theta )\cdot \log (Y_i/u_n)-{\mathbf {X}}_i'\theta \right] \cdot {\mathbf {1}}(Y_i>u_n)$$
(16)

with regressor-independent threshold \(u_n\rightarrow \infty\) for \(n\rightarrow \infty\). The estimator (16) can be viewed as an approximate maximum likelihood approach based on the weak approximation of the distribution of \(\log (Y/u_n)\) given \({\mathbf {X}}={\mathbf {x}}\) and \(Y>u_n\) by an exponential distribution with mean \(1/\alpha ({\mathbf {x}})\). Let \(k=\sum _{i=1}^n{\mathbf {1}}(Y_i>u_n)\) be the effective sample size in (16) and \(\hat{\Sigma }_{u_n}=\frac{1}{k}\sum _{i=1}^n{\mathbf {X}}_i{\mathbf {X}}_i'{\mathbf {1}}(Y_i>u_n)\). Under certain technical assumptions, Wang and Tsai (2009) prove

$$\sqrt{k}\cdot \hat{\Sigma }_{u_n}^{1/2}\cdot \left( \hat{{\varvec{\theta }}}-{\varvec{\theta }}\right) {\mathop {\longrightarrow }\limits^{D}} {\mathscr {N}}\left( {\mathbf {h}},I_{d+1}\right)$$
(17)

for some vector \({\mathbf {h}}\) and \((d+1)\)-dimensional identity matrix \(I_{d+1}\). Estimating the bias \({\mathbf {h}}\) requires detailed information on the tail, which is hardly available; in applications \({\mathbf {h}}\) is therefore set to zero.
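The minimization in (16) can be sketched numerically for a single covariate, \({\mathbf {X}}=(1,x)'\). The objective is convex in \(\theta\), so the sketch below uses damped Newton steps; this optimizer, the function name `tir_fit`, and the simulated exact-Pareto data are assumptions for illustration and are not taken from Wang and Tsai (2009).

```python
import math
import random


def tir_fit(xs, ys, u, iters=50):
    """Minimize sum over Y_i > u of exp(x_i'theta) * log(Y_i / u) - x_i'theta
    for x_i = (1, x_i)', using damped Newton steps on the convex objective."""
    data = [(x, math.log(y / u)) for x, y in zip(xs, ys) if y > u]

    def loss(t0, t1):
        return sum(math.exp(t0 + t1 * x) * l - (t0 + t1 * x) for x, l in data)

    t0, t1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, l in data:
            w = math.exp(t0 + t1 * x) * l      # common factor of gradient and Hessian
            g0 += w - 1.0
            g1 += x * (w - 1.0)
            h00 += w
            h01 += x * w
            h11 += x * x * w
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det       # Newton direction H^{-1} g (2 x 2 solve)
        d1 = (h00 * g1 - h01 * g0) / det
        step, current = 1.0, loss(t0, t1)
        while loss(t0 - step * d0, t1 - step * d1) > current and step > 1e-8:
            step *= 0.5                        # backtrack until the loss does not increase
        t0, t1 = t0 - step * d0, t1 - step * d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return (t0, t1), len(data)


# Simulated data (an assumption): exact Pareto tails with alpha(x) = exp(0.7 + 0.5 x).
random.seed(0)
theta_true = (0.7, 0.5)
n = 5000
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [
    (1.0 - random.random()) ** (-1.0 / math.exp(theta_true[0] + theta_true[1] * x))
    for x in xs
]

theta_hat, k_eff = tir_fit(xs, ys, u=1.5)
print("estimate:", theta_hat, "effective sample size k:", k_eff)
```

Because the simulated tails are exactly Pareto, \(\log (Y/u)\) given \(Y>u\) is exactly exponential here and no bias term arises; with real data the choice of the threshold governs the bias and variance trade-off.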

However, Wang and Tsai (2009) do not consider regressor-dependent thresholds \(u_n\) as in Sect. 2.1, which are important in practice to account for regression effects in, e.g., the center of the distribution. To mitigate this problem, we suggest applying their estimation procedure to the sample \((Z_{k,j},{\mathbf {X}}_{k,j})\), \(j=1,\ldots ,k\), as given in Sect. 2.1. That is, replace \(\hat{\theta }_{u_n}\) by

$$\hat{{\varvec{\theta }}}_{k,n}^{TIR}=\underset{{\varvec{\theta }}\in {\mathbb {R}}^{d+1}}{{{\text {arg}}}\,{{\text {min}}}}\;\sum _{j=1}^k\left[ \exp ({\mathbf {X}}_{k,j}'{\varvec{\theta }})\cdot \log (Z_{k,j})-{\mathbf {X}}_{k,j}'{\varvec{\theta }}\right]$$

and \(\hat{\Sigma }_{u_n}\) by \(\hat{\Sigma }_{k,n}=\frac{1}{k}\sum _{j=1}^k{\mathbf {X}}_{k,j}{\mathbf {X}}_{k,j}'\).

1.2 Three-stage procedure by Wang and Li (2013)

An alternative regression approach focusing on high conditional quantiles \(F^{-1}_Y(p|\ {\mathbf {x}})\), \(p\in [1-\varepsilon ,1)\), for some small number \(\varepsilon >0\) is proposed in Wang and Li (2013). Their method is based on the assumption that

$$F^{-1}_{g_\lambda (Y)}(p|\ {\mathbf {x}})={\mathbf {x}}'\beta _p$$

holds for some \(\lambda \in {\mathbb {R}}\), Box-Cox transformation \(g_\lambda\), regression quantiles \(\beta _p\in {\mathbb {R}}^{d+1}\) and all \(p\in [1-\varepsilon ,1)\). They propose an estimator of \(\gamma ({\mathbf {x}})\) based on a three-stage procedure:

  1. Set \(p=p_{k,n}=\frac{n-k}{n+1}\) and compute \(\hat{\lambda }\) as in Sect. 2.1.

  2. Let \(p_{n-j,n}=\frac{j}{n+1}\) for \(j=1,\ldots ,m\) with \(m=n-\lfloor n^{\eta }\rfloor\) and \(\eta =0.1\). For \(j=1,\ldots ,m\), estimate \(F_Y^{-1}(p_{n-j,n}|\ {\mathbf {x}})\) by the right-hand side of (6) with \(g=g_{\hat{\lambda }}\) and \(p=p_{n-j,n}\). Denote these estimates by \(\hat{q}_j({\mathbf {x}})\), \(j=1,\ldots ,m\). If \(\hat{q}_j({\mathbf {x}})\) is not increasing in j, apply the rearrangement procedure of Chernozhukov et al. (2010).

  3. For some integer \(k<m\), estimate \(\gamma ({\mathbf {x}})\) by

    $$\hat{\gamma }_{k,n}({\mathbf {x}})=\frac{1}{k-\lfloor n^\eta \rfloor }\sum _{j=\lfloor n^\eta \rfloor }^k\left[ \log \hat{q}_{n-j}({\mathbf {x}})-\log \hat{q}_{n-k}({\mathbf {x}})\right] .$$

Thus \(\hat{\gamma }_{k,n}({\mathbf {x}})\) is Hill’s estimator (Hill 1975) applied to the sample of \(\hat{q}({\mathbf {x}})\) values, which can be seen as pseudo observations from \(F_Y(\ \cdot \ |\ {\mathbf {x}})\). Wang and Li (2013) also propose a test statistic

$$T_n=\frac{1}{n}\sum _{i=1}^n\left( \hat{\gamma }_{k,n}({\mathbf {X}}_i)-\hat{\gamma }_p\right) ^2,\ \hat{\gamma }_p=\frac{1}{n}\sum _{i=1}^n\hat{\gamma }({\mathbf {X}}_i),$$
(18)

for testing hypothesis \({\mathscr {H}}_{0,tail}\) in (4). If \({\mathscr {H}}_{0,tail}\) holds, \(E({\mathbf {X}})=(1,0,\ldots ,0)'\in {\mathbb {R}}^{d+1}\), and either \(\gamma ^*({\mathbf {x}})=0\) or a certain homogeneity assumption is met, Wang and Li (2013) show under additional technical assumptions that \(kT_n\xrightarrow{D}\gamma ^2\chi _d^2\). They also derive the limiting distribution under heterogeneity, which in practice involves the estimation of additional parameters. For more details we refer to Wang and Li (2013, Th. 3.3 and Cor. 3.1).
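Step 3 of the procedure and the statistic (18) can be sketched as follows. The sketch takes the quantile estimates \(\hat{q}_j({\mathbf {x}})\), \(j=1,\ldots ,m\), as given; here they are replaced by exact Pareto quantiles, which is an assumption made purely for illustration, as are the function names. Sorting the values stands in for the rearrangement step, since on a fixed grid of probabilities monotone rearrangement amounts to sorting.

```python
import math


def hill_from_quantiles(q_hat, n, k, eta=0.1):
    """Hill-type estimate of gamma(x) from estimated conditional quantiles.

    q_hat[j - 1] estimates the (j / (n + 1))-quantile, j = 1, ..., m,
    with m = n - floor(n ** eta).
    """
    t = math.floor(n ** eta)
    m = n - t
    if len(q_hat) != m or not (t < k < m):
        raise ValueError("need len(q_hat) == m and floor(n**eta) < k < m")
    q_sorted = sorted(q_hat)                 # monotone rearrangement on the grid

    def q(j):                                # 1-based access to q_j
        return q_sorted[j - 1]

    ref = math.log(q(n - k))
    total = sum(math.log(q(n - j)) - ref for j in range(t, k + 1))
    return total / (k - t)


def tail_test_statistic(gammas):
    """T_n: mean squared deviation of the estimates from their average."""
    avg = sum(gammas) / len(gammas)
    return sum((g - avg) ** 2 for g in gammas) / len(gammas)


def pareto_quantiles(gamma, n, eta=0.1):
    """Exact quantiles (1 - p)^(-gamma) at p = j / (n + 1); stand-in for step 2."""
    m = n - math.floor(n ** eta)
    return [(1.0 - j / (n + 1)) ** (-gamma) for j in range(1, m + 1)]


n, k = 1000, 100
gamma_hat = hill_from_quantiles(pareto_quantiles(0.5, n), n, k)
print("estimate for true gamma = 0.5:", gamma_hat)

# Constant tail index versus a tail index that increases with x (toy covariates).
x_grid = [i / 9 for i in range(10)]
t_const = tail_test_statistic(
    [hill_from_quantiles(pareto_quantiles(0.5, n), n, k) for _ in x_grid]
)
t_trend = tail_test_statistic(
    [hill_from_quantiles(pareto_quantiles(0.3 + 0.4 * x, n), n, k) for x in x_grid]
)
print("T_n, constant gamma:", t_const, " T_n, increasing gamma:", t_trend)
```

In an application the inputs would be the estimates from step 2 evaluated at each observed \({\mathbf {X}}_i\), and \(kT_n\) would be compared with the limiting distribution quoted above.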


Cite this article

Kinsvater, P., Fried, R. Conditional heavy-tail behavior with applications to precipitation and river flow extremes. Stoch Environ Res Risk Assess 31, 1155–1169 (2017). https://doi.org/10.1007/s00477-016-1345-0
